Introduction to GIS and mapping in R
Mapping raster and vector data
Load packages
Overview
Combine the terra, sf, and ggplot2 packages to work with geospatial data and create beautiful maps.
Learning objectives
- Plot and analyze spatial data to address a research question
- Apply different mapping functions to plot spatial data
- Describe the difference between vector and raster features
- Adjust
ggplot2settings to customize maps
Mapping predicted vs. observed dates of lilac phenophases
In the following exercises, we will visualize spatial concordance between model-predicted and observed dates of lilac leaf out, which is a phenophase of lilac in which new leaves appear. A phenophase is an observable stage of an animal’s or plant’s life cycle that can be defined by a start and end point. All data were derived from the USA National Phenology Network, an organization that hosts phenology models and maintains a database of volunteer-contributed observations of phenology for ecologically and economically important species of plants and insects in the USA.
Exercise 1: Mapping predicted lilac leaf out
We import raster data for model predictions of lilac leaf out for 2018 using the rast() function in terra.
lilacModel_r <- rast("../data/lilac_2018_model.tiff")Check out attributes of the raster, such as it’s class, coordinate reference system (crs), class, extent, and range of values correspond to the day in which lilac leaf out was predicted to occur in 2018.
class(lilacModel_r)[1] "SpatRaster"
attr(,"package")
[1] "terra"
lilacModel_rclass : SpatRaster
dimensions : 1228, 2606, 1 (nrow, ncol, nlyr)
resolution : 0.02246419, 0.02107085 (x, y)
extent : -125.0208, -66.47917, 24.0625, 49.9375 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 (EPSG:4269)
source : lilac_2018_model.tiff
name : layer
min value : 6
max value : 223
We’ll go over three approaches for mapping this raster.
(1a) terra: plot()
The raster can be visualized using the plot() function in terra. This function can produce some decent looking maps, but not as nice as ggplot2 maps (in my opinion). Try it out below.
# Plot the raster
plot(lilacModel_r)The color palette can changed using the col option. The default is rev(grDevices::terrain.colors(50)). Can you find a palette that you like better? (e.g., check out other palettes in viridis by typing viridis::).
# Continuous prediction using a different color palette
plot(lilacModel_r, col = rev(viridis::viridis(50)))Raster data can be depicted in bins by specifying “interval” for the type option. The number of intervals is defined with the breaks option. Apply different numbers of breaks to see how it affects your map.
# Binned predictions
plot(lilacModel_r, type="interval", breaks = 8, col = rev(viridis::viridis(8)))# The legend falls off the map!You can also classify the bins yourself using the classify() function. Try changing the classes object in the code below to see how it impacts the map.
# How should you define your bins?
classes <- seq(0, 230, 20)
# Now classify the raster
lilacModel_r2 <- classify(lilacModel_r, classes)
# Plot the results as before, except add `type="classes"` to the function
plot(lilacModel_r2, type="classes", col = rev(viridis::viridis(length(classes))))Question 1: How does predicted lilac leaf out vary across the contiguous U.S.? Any ideas what may explain these patterns?
Answer: Leaf out occurs earlier in southern latitudes because warmer temperatures there drive faster development.
The plot() function has several other options for customizing maps. Below, I added a title, put axis labels on 3 sides.
plot(lilacModel_r,
type="interval",
breaks = 8,
col = rev(viridis::viridis(8)),
plg=list(title="Predicted\nLeafout DOY", cex=0.9), pax=list(side=1:3))(1b) Plotting rasters in ggplot2
Most people are more familiar with tidyverse packages and functions than those in terra or its predecessor raster. Thus, mapping rasters with ggplot2 is often preferred over plot(). We’ll cover two ggplot2 functions: (1) geom_raster(), which requires data to be in data frame format, and (2) geom_spatraster(), which can work directly with SpatRaster objects. The latter function is part of the relatively new tidyterra package. For both examples, days of the year will be binned into months to facilitate comparisons with observed lilac leaf out dates (Exercise 2, below). This task requires defining a factor variable, as described below.
ggplot2: geom_raster()
First, convert the raster into a data frame using as.data.frame() in terra. Set the xy option to TRUE to retain coordinate information (x and y columns), and rename the data column from layer to leaf_out_doy.
# Convert raster data to a data frame
lilacModel_df <- lilacModel_r %>%
as.data.frame(xy = TRUE) %>%
rename("leaf_out_doy" = layer)
head(lilacModel_df) x y leaf_out_doy
71692 -95.15470 49.35805 121
71693 -95.13223 49.35805 121
71694 -95.10977 49.35805 121
71695 -95.08731 49.35805 121
71696 -95.06484 49.35805 125
71697 -95.04238 49.35805 125
Next, we create a data frame that specifies the month for each day of year, which will be used for defining factor levels and for plotting in ggplot2.
# Data frame needed to create pretty plots
catgories_df <- data.frame(
# Day and month of year
leaf_out_doy = 1:365,
leaf_out_month = cut_interval(1:365, 12)) %>%
# Bin dates by month and re-format labels to remove brackets, parentheses, etc.
mutate(leaf_out_month = format(as.Date(
leaf_out_doy, origin = "2018-01-01"), "%b"),
leaf_out_month = gsub("\\(|\\]|\\[", "", leaf_out_month)) %>%
mutate(leaf_out_month = gsub(",", "-", leaf_out_month))
# Convert month to a factor so they're in the right order on plots
# Factor levels are ordered by day of the year
catgories_df$leaf_out_month <- factor(
catgories_df$leaf_out_month,
levels = unique(catgories_df$leaf_out_month[order(catgories_df$leaf_out_doy)]))Join the data frame with factor levels to lilacModel_df and examine its structure.
# Join month labels to lilac predictions
lilacModel_df <- left_join(lilacModel_df, catgories_df, by = "leaf_out_doy")
head(lilacModel_df) x y leaf_out_doy leaf_out_month
1 -95.15470 49.35805 121 May
2 -95.13223 49.35805 121 May
3 -95.10977 49.35805 121 May
4 -95.08731 49.35805 121 May
5 -95.06484 49.35805 125 May
6 -95.04238 49.35805 125 May
Next, we create some custom labels for the plots, and define a pretty color palette.
# Create a vector of labels to show in plot legend
labs <- unique(catgories_df$leaf_out_month)
# Create a color palette (vector) for the plot
# Classic cyclic palette in "ggthemes" package
pal <- ggthemes_data[["tableau"]][["color-palettes"]][["regular"]]$`Classic Cyclic`
pal <- pal$value # Vector of hex codes for colors
pal <- pal[1:12] # Keep only 12 colors
names(pal) <- unique(catgories_df$leaf_out_month) # Name colors by monthFinally, the lilacModel_df data frame can be plotted using geom_raster().
# Create a map using ggplot2
ggplot() +
geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) This map looks okay but can definitely be improved. For example, we could add a plot title, custom legend title, and maybe get rid of the default gray background, grid lines, and axes labels. First add the titles.
# Create a map using ggplot2
ggplot() +
geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal) +
ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") Now customize the map theme. If you’re producing multiple plots, it’s more concise to define your custom theme a single time (DRY principle = Don’t Repeat Yourself).
# Custom theme
my_theme <- theme(#panel.grid= element_blank(),
panel.background = element_blank(),
panel.border = element_rect(fill = NA),
axis.title = element_blank(),
#axis.text = element_blank(),
#axis.ticks = element_blank(),
plot.title = element_text(face = "bold", size = 16),
legend.key = element_rect(fill = "white"),
legend.title = element_text(face="bold", size = 12),
legend.text = element_text(size = 11),
legend.key.height = unit(0.0025, "cm"),
legend.key.width = unit(1, "cm"))Apply the custom theme to the plot.
# Create a map using ggplot2
ggplot() +
geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal) +
ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") +
my_theme # Our custom themeChallenge 1: Change at least two plot features to align with your preferences. For example, you could remove the plot axes (lat and long), change the legend attributes, font types and sizes, etc. (my_theme), or change the color palette (pal).
FYI: The ggsn package has functions to add a scale bar and north arrow. Check it out sometime!
Adding boundary data to maps
The map is more informative if we add state boundaries. Some R packages provide these data, including rnaturalearth and USAboundaries ; however, installing them can sometimes be problematic. To avoid troubleshooting, I saved USA boundary data as shapefiles. First, we need to use st_use_s2(FALSE) because sf assumes the world is round, and uses great circles for straight lines. This will cause issues for cropping boundaries to the same extent as our raster data.
sf_use_s2(FALSE)Spherical geometry (s2) switched off
Now import the shapefile using the st_read() function and plot the first column.
# US states feature
states <- st_read("../data/states.shp")Reading layer `states' from data source
`/home/randre/Documents/Code/CASCADIA_R_Intro_to_GIS_2024/data/states.shp'
using driver `ESRI Shapefile'
Simple feature collection with 52 features and 12 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS: WGS 84
# Plot the first data column
plot(states[1])Question 2: What is the class and geometry type of states? Describe two other differences between states and lilacModel_r.
Answer: the class is both sf and data.frame. The sf portion contains geographic information such as geometry, extent, and CRS, whereas the data frame has associated data, such as state names and abbreviations.
We’re going to run into issues due to differing coordinate reference systems (CRS) of states and lilacModel_r. You can see these differences by using the crs() function on both objects.
# CRS of states
crs(states)[1] "GEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]]"
CRS of raster data.
# CRS of lilacModel_r
crs(lilacModel_r)[1] "GEOGCRS[\"NAD83\",\n DATUM[\"North American Datum 1983\",\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101004,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4269]]"
Change the CRS of states to match the raster data using st_transform. Check out the results.
# Transform coordinates of states
states <- st_transform(states, crs(lilacModel_r))
crs(states)[1] "GEOGCRS[\"NAD83\",\n DATUM[\"North American Datum 1983\",\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101004,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4269]]"
Next, use st_crop() to crop your states feature to the same extent as your raster, which is the contiguous U.S. Check out the results. What happens if you skip the st_transform() step and run this code?
# Crop to this extent
states_c <- st_crop(states, lilacModel_r)although coordinates are longitude/latitude, st_intersection assumes that they
are planar
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
# Plot the result
plot(states_c[1])Challenge 2: Can you think of a dplyr approach to obtaining data only for CONUS? Hint: remove rows with undesired states (note that Puerto Rico [PR] is in the data).
states_c2 <- filter(states, !stusps %in% c("AK", "HI", "PR"))
plot(states_c2[1])Finally, use geom_sf() to add an sf object to a plot - i.e., the states feature.
# Add an 'sf' object to them map
ggplot() +
geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
geom_sf(data = states_c, fill = NA) +
scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal) +
ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") +
my_themeQuestion 3: What happens when you don’t enter NA for the fill option for the states_c feature? Why?
Answer: the states features covers up the raster data because it’s multi-polygon data. The default fill color is gray.
You could just continue to use fill = NA, or convert the geometry of states_c to MULTILINESTRING using the st_cast() function.
# Convert geometry from polygon to multilinestring
states_c %>%
st_cast("MULTILINESTRING") Simple feature collection with 49 features and 12 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: -124.7258 ymin: 24.49813 xmax: -66.9499 ymax: 49.38436
Geodetic CRS: NAD83
First 10 features:
statefp statens affgeod geoid stusps name lsad aland
1 06 01779778 0400000US06 06 CA California 00 403671196038
2 55 01779806 0400000US55 55 WI Wisconsin 00 140292246684
3 16 01779783 0400000US16 16 ID Idaho 00 214049923496
4 27 00662849 0400000US27 27 MN Minnesota 00 206232157570
5 19 01779785 0400000US19 19 IA Iowa 00 144659688848
6 29 01779791 0400000US29 29 MO Missouri 00 178052563675
7 24 01714934 0400000US24 24 MD Maryland 00 25151895765
8 41 01155107 0400000US41 41 OR Oregon 00 248628426864
9 26 01779789 0400000US26 26 MI Michigan 00 146614604273
10 30 00767982 0400000US30 30 MT Montana 00 376973673895
awater stat_nm stt_bbr jrsdct_ geometry
1 20294133830 California CA state MULTILINESTRING ((-118.594 ...
2 29343721650 Wisconsin WI state MULTILINESTRING ((-86.93428...
3 2391577745 Idaho ID state MULTILINESTRING ((-117.243 ...
4 18949864226 Minnesota MN state MULTILINESTRING ((-97.22904...
5 1085996889 Iowa IA state MULTILINESTRING ((-96.62187...
6 2487215790 Missouri MO state MULTILINESTRING ((-95.76564...
7 6979171386 Maryland MD state MULTILINESTRING ((-76.04622...
8 6170953359 Oregon OR state MULTILINESTRING ((-124.5525...
9 103872203398 Michigan MI state MULTILINESTRING ((-84.61622...
10 3866689601 Montana MT state MULTILINESTRING ((-116.0492...
You can add labels to your map using the geom_sf_text() or geom_sf_label() functions.
# Add an 'sf' object to them map
ggplot() +
geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
geom_sf(data = states_c, fill = NA) +
geom_sf_text(data = states_c, aes(label = stusps), size = 2) +
scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal) +
ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") +
my_themeggplot2: geom_raster()
An identical map can be produced using the geom_spatraster() function in tidyterra, which works directly with SpatRaster objects. First, create a copy of your raster and convert the data into categorical (factor) format for plotting. The categories_df data frame is used to define factor levels.
# Define factor levels of the raster using the "catgories_df" data frame
lilacModel_r2 <- lilacModel_r
levels(lilacModel_r2) <- catgories_dfLook at both rasters and notice the data formats.
# Original raster (numeric data - days of the year)
lilacModel_rclass : SpatRaster
dimensions : 1228, 2606, 1 (nrow, ncol, nlyr)
resolution : 0.02246419, 0.02107085 (x, y)
extent : -125.0208, -66.47917, 24.0625, 49.9375 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 (EPSG:4269)
source : lilac_2018_model.tiff
name : layer
min value : 6
max value : 223
# New raster (categorical data - months of the year)
lilacModel_r2class : SpatRaster
dimensions : 1228, 2606, 1 (nrow, ncol, nlyr)
resolution : 0.02246419, 0.02107085 (x, y)
extent : -125.0208, -66.47917, 24.0625, 49.9375 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat NAD83 (EPSG:4269)
source : lilac_2018_model.tiff
categories : leaf_out_month
name : leaf_out_month
min value : Jan
max value : Aug
Create the same map as before but instead use geom_spatraster() with your categorical raster instead of geom_raster().
# Plot produced using "geom_spatraster"
conus_mod <- ggplot() +
geom_spatraster(data = lilacModel_r2, aes(fill = leaf_out_month)) +
geom_sf(data = states_c, fill = NA) +
geom_sf_text(data = states_c, aes(label = stusps), size = 2) +
scale_fill_manual(name = "Predicted First\nLeaf Out", label = labs, values = pal, na.value = "white") +
ggtitle("Month of Predicted First Leaf Out in Lilac in 2018") +
my_theme
conus_modSave your map using ggsave() and see how it looks. If you’re not happy with it, adjust the image dimensions of change the plot theme (i.e., my_theme).
ggsave(conus_mod, filename = "../plots/Lilac_leaf_out_model_2018.png",
dpi = 300, units = c('in'), width = 10, height = 6)Exercise 2: Adding phenometric data to the map
We can add more layers to our map to ask certain questions, such as: How do model-predicted dates of lilac leaf out compared with observed dates? Below, we import phenometric data for observed lilac leaf out dates for 2018. These data were collected by volunteers and submitted to the USA National Phenology Network’s database
# Phenometric data
lilacObs_df <- read.csv("../data/lilac_2018_obs.csv")Take a look at the data dimensions. How many observations are there?
dim(lilacObs_df)[1] 182 25
There are 77 unique sites in the dataset. The goal is to add the location for each observation to our previous map.
length(unique(lilacObs_df$site_id))[1] 77
We only need data in three columns in the phenometric data: longitude, latitude, and first_yes_doy. The first_yes_doy column specifies which day of year that lilacs first had leaf out at the site. We’ll subset the columns using select() and take a look at the first several rows of data.
lilacObs_df <- lilacObs_df %>%
filter(state != "ON") %>% # Remove an observation from Canada
dplyr::select("longitude", "latitude", "first_yes_doy") %>%
rename("leaf_out_doy" = "first_yes_doy") %>%
left_join(catgories_df, by = "leaf_out_doy") Take a look at the first several rows of data.
head(lilacObs_df) longitude latitude leaf_out_doy leaf_out_month
1 -87.89300 43.09780 121 May
2 -87.89300 43.09780 121 May
3 -92.75200 36.52450 53 Feb
4 -83.05769 43.72269 118 Apr
5 -76.55337 38.88813 57 Feb
6 -76.55337 38.88813 57 Feb
For plotting purposes, we convert the phenometric data to a simple feature (sf) object using the st_as_sf() function. You must specify which columns contain the coordinates, and also define the CRS, which should be the same as our other objects.
# Convert data frame to a simple feature
lilacObs_sf <- lilacObs_df %>%
st_as_sf(coords = c("longitude", "latitude"), crs = crs(states))Compare the geometry of lilacObs_sf and states.
# Lilac observations feature
lilacObs_sf$geometryGeometry set for 181 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -122.7566 ymin: 35.01538 xmax: -70.89148 ymax: 47.95269
Geodetic CRS: NAD83
First 5 geometries:
POINT (-87.893 43.0978)
POINT (-87.893 43.0978)
POINT (-92.752 36.5245)
POINT (-83.05769 43.72269)
POINT (-76.55337 38.88813)
# States feature
states$geometryGeometry set for 52 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS: NAD83
First 5 geometries:
MULTIPOLYGON (((-118.594 33.4672, -118.4848 33....
MULTIPOLYGON (((-86.93428 45.42115, -86.83575 4...
MULTIPOLYGON (((-117.243 44.39097, -117.2151 44...
MULTIPOLYGON (((-97.22904 49.00069, -96.93096 4...
MULTIPOLYGON (((-96.62187 42.77925, -96.57794 4...
Question 4: How does the geometry of the lilac observations (lilacObs_sf) differ from the states feature? Why?
Answer: the lilac observations have a POINT geometry whereas the states data have a MULTIPOLYGON geometry. Observations are from single locations, whereas state boundaries cover many points over the U.S.
Now let’s add the phenometric data so we can visualize spatial concordance between model-predicted and observed dates for first leaf out. Sites for each observation are plotted using their coordinate information and colored according to values in the leaf_out_date column.
# Add sites to the map
conus_modVobs <- ggplot() +
#geom_spatraster(data = lilacModel_r, aes(fill = leaf_out_month)) +
geom_raster(data = lilacModel_df, aes(x = x, y = y, fill = leaf_out_month)) +
geom_sf(data = states_c, fill=NA) +
geom_sf(data = lilacObs_sf, aes(color = leaf_out_month), size = 3, show.legend = FALSE) +
geom_sf(data = lilacObs_sf, color = "black", size = 3, shape = 1) + # black outlines around sites
scale_fill_manual(name = "Month of First\nLeaf Out", label = labs, values = pal, drop = FALSE) +
scale_color_manual(name = "Month of First\nLeaf Out", label = labs, values = pal, drop = FALSE) +
ggtitle("Predicted vs. Observed 1st Leaf Out in Lilac in 2018") +
my_theme
conus_modVobsSave your map using ggsave() and see how it looks. If you’re not happy with it, adjust the image dimensions of change the plot theme (i.e., my_theme).
ggsave(conus_modVobs, filename = "../plots/Lilac_leaf_out_modelVsObs_2018.png",
dpi = 300, units = c('in'), width = 10, height = 6)Question 5: What does your map tell you about predicted vs. observed first leaf out in lilac? Can you think of a better way to compare the two datasets on a map?
Answer: Our map indicates that predicted dates of lilac leaf out in 2018 did not always correspond to observed dates. In some cases, the model seems to under-predict dates (leaf out too early) whereas in others it over-predicts (leaf out too late). Notice that two sites had dates that were much later than predicted dates. In this case, perhaps the volunteers failed to notice that lilac leaf out had already occurred, although other explanations are possible.
Combining different maps
Let’s say we’re interested in results for a particular region of CONUS, such as a state. The map can be zoomed with the coord_sf() function, in which the x- and y-limits of the region are defined. In the commented coord_sf() function below, enter your own limits (xmin, xmax, ymin, ymax), but make sure they’re within the U.S.
# Map for specific region
# The expand = FALSE ensures that limits are taken exactly as provided
conus_modVobs +
# Define x- and y-limits
coord_sf(xlim = c(-90.5542, -82.3047),
ylim = c(41.6311, 47.5739),
expand = FALSE) We can also add county boundaries by importing the shapefile in the data folder (./data/counties.shp). Remember to change the CRS as we did above for states. Also, feel free to convert the geometry to MULTILINESTRING.
# Simple feature ("sf") for US counties
counties <- st_read("../data/counties.shp")Reading layer `counties' from data source
`/home/randre/Documents/Code/CASCADIA_R_Intro_to_GIS_2024/data/counties.shp'
using driver `ESRI Shapefile'
Simple feature collection with 3234 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -179.1489 ymin: -14.5487 xmax: 179.7785 ymax: 71.36516
Geodetic CRS: NAD83
counties <- counties %>%
st_transform(crs = crs(states)) %>%
st_cast("MULTILINESTRING") Add the counties to your map below, remove the legend and title, and make any other desired changes. Give the map a name so it can be used for the next step.
# Map for region of interest
# Lilac leaf out observations are added again so boundary lines don't cover them
zoom_modVobs <- conus_modVobs +
geom_sf(data = counties, fill = NA, color = "gray30", size = 0.25) + # Add counties
geom_sf(data = lilacObs_sf, aes(color = leaf_out_month), size = 3, show.legend = FALSE) +
coord_sf(xlim = c(-90.5542, -82.3047),
ylim = c(41.6311, 47.5739),
expand = FALSE) + # Take limits exactly as defined
theme(legend.position = "none",
plot.title = element_blank())
zoom_modVobsFinally, we can combine maps for CONUS and your region of interest using the patchwork package. The \ operator indicates that the map for CONUS goes on top of the region map, the width setting indicates that the CONUS is map is 2X bigger, and the legend is “collected” to ensure it stays in the correct position (on right-hand side).
# Combine maps using patchwork (ToDo - this still doesn't run [Roger])
both_modVobs <- conus_modVobs / zoom_modVobs +
plot_layout(guides = "collect", widths = c(2, 1))
both_modVobsSave your map and see what you think. Feel free to adjust the theme, plot dimensions, or anything else to improve the map.
ggsave(both_modVobs, filename = "../plots/Lilac_leaf_out_modelVsObs_2018_2.png",
dpi = 300, units = c('in'), width = 9, height = 9)Note: Other options for combining plots include functions in the ggpubr [ggarrange()] and cowplot [plot_grid()] packages.
Exercise 3: Extract and analyze spatial data
There are numerous functions in terra for extracting, manipulating, and analyzing raster data. Here, we’ll use the extract function to extract model predictions (predicted day of year) from the precise location where lilac observations (observed day of year) were collected. This allows us to analyze rather than just visualize the two datasets.
Extract the predictions, convert results to a data frame, and rename the data column from layer to leaf_out_doy.
# Extract model predictions for each observation location
lilacObs_extr <- extract(lilacModel_r, lilacObs_sf) %>%
as.data.frame() %>%
rename("leaf_out_doy" = "layer") %>%
as.data.frame() # Next, we can estimate the extent to which the model under-predicts vs. over-predicts the date of lilac leaf out by subtracting the leaf_out_doy columns.
differences <- lilacObs_extr$leaf_out_doy - lilacObs_sf$leaf_out_doy Create a histogram of differences using the hist() function in base R.
hist(differences, breaks = seq(-250, 50, 25))Now calculate some simple summary statistics of the differences.
# Mean
mean(differences)[1] -13.71271
# Range
range(differences)[1] -237 48
# Median
median(differences)[1] 1
Question 6: What information do your simple analyses provide that can’t be seen on a map? Overall, what do the results say about the performance of the model for lilac leaf out?
Answer: On average, predicted dates were 13.71 days earlier (range = -237 to 48 days) than observed dates, which suggests that the model is under-predicting the date of lilac leaf out. However, the extreme outliers are likely due to observer error, as we noted above. The model actually seems to perform pretty well if we reduce the impact of outliers by calculating the median difference, which is only 1 day.
Wrap-up
These exercises only scratched the surface of mapping in R! Here’s a summary:
- terra is the most important R package for working with rasters
plot()is useful for quickly plotting rasters, but I preferggplot2functions for “final” mappingContains numerous functions for extracting, manipulating, and analyzing raster data
Has methods for manipulating data in vector form
- ggplot2 has several functions for plotting rasters including:
geom_raster()orgeom_tile(): require a data framegeom_spatraster(): requires a SpatRaster objectgeom_stars(): uses thestarspackage, which was not covered here
Potentially useful tutorials, free books, etc.
Compilation of various resources
- R Spatial data science blogs
Books
- Geocomputation with R by R. Lovelace et al. (2022)
- Geographic Data Science with R by M.C. Wimberly (2023)
- A Crash Course in GIS using R by M. Branion-Calles (2021)
- Data Analysis and Visualization with R: Spatial by S. Sun (2023)
Tutorials/vignettes
- Spatial Data Science with R and “terra” by R. Hijmans (2019-2023)
- Geospatial Data Science in R by Zia Ahmed
- Introduction to GIS with R by Jesse Sadler
- GIS and Spatial Analysis with R by Manny Gimmond
- Vignette for sf
- R as GIS for Economists by Taro Mieno
Acknowledgements
Parts of these exercises were modified from a tutorial for the rnpn package created by Alyssa Rosemartin. Raster and observation data used for the demo were downloaded from the USA National Phenology Network’s servers using rnpn. Thanks to Roger Andre for providing helpful feedback on earlier versions of this document.